Abstract: We consider projection methods for the estimation of the cumulative distribution function under interval censoring, case 1. Such censored data, also known as current status data, arise when the only information available on the variable of interest is whether it is greater or less than an observed random time. Two types of adaptive estimators are investigated. The first one is a two-step estimator built as a quotient estimator. The second estimator results from a mean-square regression contrast. Both estimators are proved to achieve automatically the standard optimal rate associated with the unknown regularity of the function, but with some restriction for the quotient estimator. Simulation experiments are presented to illustrate and compare the methods.
1. Introduction

Let $X$ be a survival time with unknown cumulative distribution function (cdf) $F$. In the interval censoring case 1 model, we are not able to observe the survival time $X$. Instead, an observation consists of the pair $(U, \delta)$, where $U$ is an examination time and $\delta$ is the indicator function of the event $(X \le U)$. Roughly speaking, the only knowledge about the variable of interest $X$ is whether it has occurred before $U$ or not. Early examples of such interval censoring can be found in demography studies, see Diamond and McDonald (1991). In epidemiology, these censoring schemes also arise, for instance, in AIDS studies or, more generally, in the study of infectious diseases when the infection time is an unobservable event. We assume that $U$ is independent of $X$, that $F$ has density $f$ and that the cdf $G$ of $U$ has density $g$. Such data, also known as current status data, may remind us of right-censored data, where the observed data is the pair $(\min(X, C), \mathbf{1}_{\{X \le C\}})$ and $C$ is a censoring variable. However, the estimation procedures in these two censoring models are substantially different. In the right-censoring model, the Kaplan and Meier (1958) estimator is well studied and is asymptotically normal at the rate $\sqrt{n}$. Current status data, for their part, have been studied by many authors in the last two decades; see Jewell and van der Laan (2004) for a state of the art. In the interval censoring model, the nonparametric maximum likelihood estimator (NPMLE) of the survival function is proved in Groeneboom and Wellner (1992) to be uniformly consistent and pointwise convergent, at the rate $n^{1/3}$, to a nonnormal asymptotic distribution. In van de Geer (1993), it is also established that the NPMLE converges at rate $n^{-1/3}$ in $L^2$-norm.

Recent extensions take two directions. First, more general contexts are considered. For example, van der Vaart and van der Laan (2006) build nonparametric estimates of the survival function for current status data in the presence of time-dependent and high-dimensional covariates: they provide central limit theorems with rate $n^{1/3}$ and nonstandard limiting processes. The second direction aims at proposing smooth estimates that may take into account the possible smoothness of the survival function. Indeed, the NPMLE is a piecewise constant function. The locally linear smoother proposed by Yang (2000) may, contrary to the NPMLE, be non-monotone, but it has a better convergence rate than the NPMLE when the density $f$ is smooth and the kernel function and the bandwidth are properly chosen. In the same spirit, Ma and Kosorok (2006) introduce an adaptive modified penalized least squares estimator built with smoothing splines, but their main objective is the study of semiparametric models. They have in mind the same type of penalization device that we present here, but their penalty functions contain many complicated terms that would be difficult to estimate.

Here, we also pursue the search for smooth (or piecewise smooth) adaptive estimators. We present two different penalized minimum contrast estimators built on trigonometric, polynomial or wavelet spaces, whose associated penalty terms are really simple. Minimizing the penalized contrast function allows one to choose a space that leads both to a non-asymptotic automatic squared-bias/variance compromise and to an asymptotically optimal convergence rate, according to the regularity of the function $F$ in terms of Besov spaces. An interesting feature of the procedure is that the definition and the study of the estimators are made straightforward by the powerful Talagrand (1996) inequality for centered empirical processes. We also use technical properties proved in a regression framework by Baraud et al. (2001) and Baraud (2002) for the mean-square estimator. Globally, the available tools and algorithms for adaptive density and regression estimation make our solution easy to study and to implement.

The plan of the paper is as follows. Section 2 introduces the quotient and the regression estimators, after the description of the lifetime model. We also give a detailed description of the projection spaces with their main properties. Then, in Section 3, we study a projection estimator of the density of the failure times which have occurred before the examination time. Both convergence and adaptation results are given. This estimator is then applied to the estimation of the cumulative distribution function via a quotient construction. Section 4 describes a direct adaptive procedure to estimate the distribution function based on a mean-square regression contrast. Simulations compare both approaches in Section 5. We use as benchmarks the NPMLE and also the simple piecewise constant estimator proposed by Birgé (1999). Lastly, most proofs and technical lemmas are deferred to Section 6.

2. Definition of the estimators 
2.1. Model and assumptions 

Let $(U_1, \delta_1), \ldots, (U_n, \delta_n)$ be a sample of the pair $(U, \delta)$, where $\delta_i = \mathbf{1}_{\{X_i \le U_i\}}$ and the sequences $(U_i)_{1 \le i \le n}$ and $(X_i)_{1 \le i \le n}$ are independent. We are interested in the estimation of the distribution function $F$ of the lifetime $X$ on a compact set $A$ only. By rescaling the data, we take $A = [0, 1]$ without loss of generality. The compact set is considered as fixed in the theory, even if, in practice, it is determined from the data. Recall that we denote by $f$ and $F$ the density and the cumulative distribution function of the unobserved lifetime $X$, and by $g$ and $G$ those of the examination time $U$. A function of interest is the density $\psi$ of the $U_i$ restricted to the individuals for which $\delta_i = 1$, defined by:

$$\psi(x) = F(x)\, g(x). \qquad (2.1)$$

It is clear that this equation provides a way to build an estimator of $F$. This approach is developed in Section 3. The censoring mechanism is such that the conditional law of $\delta = \mathbf{1}_{\{X \le U\}}$ given $U = u$ is a Bernoulli law with parameter $F(u)$. As a consequence, we have:

$$\mathbb{E}(\delta \mid U = u) = F(u). \qquad (2.2)$$

This relation leads to a direct mean-square estimator of $F$.
Both strategies require the following assumption:

[A1] The density $g$ of the random time $U$ is lower and upper bounded on $A$, so that there exist real finite constants $g_0 > 0$ and $g_1 > 0$ such that, for all $x \in A$, $g_0 \le g(x) \le g_1$.

2.2. Definition of the estimators 

Assume that we have at our disposal a collection of finite-dimensional spaces of functions, denoted by $(S_m)_{m \in \mathcal{M}_n}$, satisfying the following assumption:

(H1) $(S_m)_{m \in \mathcal{M}_n}$ is a collection of finite-dimensional linear subspaces of $L^2([0,1])$, with dimension $\dim(S_m) = D_m$ such that $D_m \le n$ for all $m \in \mathcal{M}_n$, and satisfying:

$$\exists \Phi_0 > 0,\ \forall m \in \mathcal{M}_n,\ \forall t \in S_m, \quad \|t\|_\infty \le \Phi_0 \sqrt{D_m}\, \|t\|, \qquad (2.3)$$

where $\|t\|^2 = \int_0^1 t^2(x)\,dx$ for $t$ in $L^2([0,1])$.

2.2.1. Quotient estimator 

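As already mentioned, the first strategy requires estimating $\psi$ and $g$. The estimator $\hat g$ of $g$ is chosen as the adaptive density estimator defined in Massart (2007), Chapter 7, namely $\hat g = \hat g_{\hat m_g}$, where $\hat g_m = \arg\min_{t \in S_m} \gamma_n^g(t)$,

$$\gamma_n^g(t) = \|t\|^2 - \frac{2}{n}\sum_{i=1}^n t(U_i),$$

and

$$\hat m_g = \arg\min_{m \in \mathcal{M}_n}\left[\gamma_n^g(\hat g_m) + \mathrm{pen}_g(m)\right], \qquad (2.4)$$

with $\mathrm{pen}_g(m) = \kappa\, \Phi_0^2 D_m/n$.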
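To make the penalized choice (2.4) concrete, here is a minimal Python sketch for the simplest collection: regular histograms on [0, 1]. This is the case r = 0 of collection [P] of Section 2.3, for which $\Phi_0 = 1$. The sketch assumes the $U_i$ have already been rescaled into [0, 1]. The candidate dimensions and the value κ = 4 are illustrative assumptions; Section 5 uses constants equal to 4 together with extra correction terms that are not reproduced here. For an orthonormal basis, the contrast at the estimate reduces to minus the sum of the squared coefficients, and the code relies on this.

```python
import math
import random


def histogram_coefs(u, D):
    """Coefficients a_j = (1/n) sum_i phi_j(U_i) on the regular histogram space
    with D bins, phi_j = sqrt(D) * 1_{[(j-1)/D, j/D)} (orthonormal, Phi_0 = 1).
    Assumes every U_i lies in [0, 1]."""
    n = len(u)
    a = [0.0] * D
    for x in u:
        a[min(int(x * D), D - 1)] += math.sqrt(D) / n
    return a


def evaluate(a, x):
    """Value at x in [0, 1] of sum_j a_j phi_j."""
    D = len(a)
    return math.sqrt(D) * a[min(int(x * D), D - 1)]


def contrast_at_estimate(a):
    # gamma_n^g(g_m) = ||g_m||^2 - (2/n) sum_i g_m(U_i) = sum a_j^2 - 2 sum a_j^2
    return -sum(c * c for c in a)


def select_density(u, dims, kappa=4.0, phi0=1.0):
    """Penalized choice (2.4): minimize gamma_n^g(g_m) + kappa * phi0^2 * D / n."""
    n = len(u)
    best = None
    for D in dims:
        a = histogram_coefs(u, D)
        crit = contrast_at_estimate(a) + kappa * phi0 ** 2 * D / n
        if best is None or crit < best[0]:
            best = (crit, D, a)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.random() for _ in range(2000)]  # U ~ U(0, 1), so g = 1
    D, a = select_density(u, range(1, 33))
    print(D, evaluate(a, 0.5))
```

Swapping in another orthonormal basis only changes `histogram_coefs` and `evaluate`; the selection loop stays the same.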
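For the estimation of $\psi$, we consider the following contrast function:

$$\gamma_n^\psi(t) = \|t\|^2 - \frac{2}{n}\sum_{i=1}^n \delta_i\, t(U_i). \qquad (2.5)$$

Let then

$$\hat\psi_m = \arg\min_{t \in S_m} \gamma_n^\psi(t). \qquad (2.6)$$

Then we define $\tilde\psi = \hat\psi_{\hat m}$, where

$$\hat m = \arg\min_{m \in \mathcal{M}_n}\left[\gamma_n^\psi(\hat\psi_m) + \mathrm{pen}_\psi(m)\right].$$

The penalty function will be motivated and defined later. The contrasts $\gamma_n^g$ and $\gamma_n^\psi$ are both empirical versions of the $L^2$ distance between a function $t$ in $S_m$ and the function of interest ($g$ or $\psi$). To see this, take for instance the expectation of $\gamma_n^\psi$:

$$\mathbb{E}\left(\gamma_n^\psi(t)\right) = \|t\|^2 - 2\langle t, \psi\rangle = \|t - \psi\|^2 - \|\psi\|^2,$$

with $\langle t, s\rangle = \int t(x)s(x)\,dx$, since $\mathbb{E}(\delta_1 t(U_1)) = \int t(u)F(u)g(u)\,du = \langle t, \psi\rangle$. This illustrates that minimizing $\gamma_n^\psi$ is likely to provide a function $t$ that minimizes $\|t - \psi\|^2$ in mean, and thus estimates $\psi$ on the space $S_m$.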

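Now that the adaptive estimators $\tilde\psi$ of $\psi$ and $\hat g$ of $g$ are defined, we can use Equality (2.1) to build a quotient estimator of the distribution function $F$ by setting

$$\tilde F(x) = \begin{cases} 0 & \text{if } \tilde\psi(x)/\hat g(x) < 0, \\ \tilde\psi(x)/\hat g(x) & \text{if } 0 \le \tilde\psi(x)/\hat g(x) \le 1, \\ 1 & \text{if } \tilde\psi(x)/\hat g(x) > 1. \end{cases} \qquad (2.7)$$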
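The truncation in (2.7) is simple to code. The following Python sketch evaluates it at a single point from given values of $\tilde\psi(x)$ and $\hat g(x)$. Refusing a nonpositive density estimate is an assumption of this sketch, since (2.7) does not say what to do when $\hat g(x) \le 0$.

```python
def quotient_cdf(psi_x, g_x):
    """Quotient estimator (2.7) at one point: psi_x / g_x truncated to [0, 1].

    Assumption of this sketch: g_x > 0 (the analysis of Section 3.3 works on
    the event where the density estimate stays above g_0 / 2)."""
    if g_x <= 0:
        raise ValueError("density estimate must be positive")
    ratio = psi_x / g_x
    if ratio < 0:
        return 0.0
    if ratio > 1:
        return 1.0
    return ratio


if __name__ == "__main__":
    print([quotient_cdf(p, 0.5) for p in (-0.1, 0.25, 0.7)])  # [0.0, 0.5, 1.0]
```

Because the true $F(x)$ lies in [0, 1], the truncation can only move the ratio closer to $F(x)$. This fact is used again in Section 3.3.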







2.2.2. Regression estimator

On the other hand, a direct estimator of the cdf $F$ can be obtained by considering the following mean-square contrast:

$$\gamma_n^{MS}(t) = \frac{1}{n}\sum_{i=1}^n \left[\delta_i - t(U_i)\right]^2. \qquad (2.8)$$

In this case, we set

$$\hat F_m = \arg\min_{t \in S_m} \gamma_n^{MS}(t), \qquad (2.9)$$

in the sense that we can always compute the vector $(\hat F_m(U_1), \ldots, \hat F_m(U_n))$ as the orthogonal projection of the vector $(\delta_1, \ldots, \delta_n)$ onto the subspace of $\mathbb{R}^n$ defined by $\{(t(U_1), \ldots, t(U_n)),\ t \in S_m\}$. Then we define $\hat F_{\hat m_0}$ by:

$$\hat m_0 = \arg\min_{m \in \mathcal{M}_n}\left\{\gamma_n^{MS}(\hat F_m) + \mathrm{pen}_{MS}(m)\right\}, \qquad (2.10)$$

with

$$\mathrm{pen}_{MS}(m) = \kappa_0 \frac{D_m}{n}, \qquad (2.11)$$

where $\kappa_0$ is a numerical constant.
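On a histogram space, the orthogonal projection in (2.9) is simply the within-bin average of the $\delta_i$. The Python sketch below implements (2.9)-(2.11) for regular histograms on [0, 1] (degree r = 0 only). This is an illustrative special case, not the piecewise polynomial collection used in Section 5. The candidate dimensions and $\kappa_0 = 4$ are assumptions.

```python
import random


def bin_index(x, D):
    """Index of the regular bin [j/D, (j+1)/D) containing x in [0, 1]."""
    return min(int(x * D), D - 1)


def fit_ms_histogram(u, delta, D):
    """Least-squares fit (2.9) on the histogram space with D bins.

    Projecting (delta_1, ..., delta_n) onto {(t(U_1), ..., t(U_n))} gives the
    within-bin mean of delta; empty bins get 0 (their value does not enter the fit)."""
    sums, counts = [0.0] * D, [0] * D
    for x, d in zip(u, delta):
        j = bin_index(x, D)
        sums[j] += d
        counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def ms_contrast(u, delta, heights):
    """gamma_n^MS(t) = (1/n) sum_i (delta_i - t(U_i))^2 for a histogram t."""
    D = len(heights)
    return sum((d - heights[bin_index(x, D)]) ** 2 for x, d in zip(u, delta)) / len(u)


def select_ms(u, delta, dims, kappa0=4.0):
    """Penalized choice (2.10)-(2.11) with pen_MS(D) = kappa0 * D / n."""
    n = len(u)
    best = None
    for D in dims:
        h = fit_ms_histogram(u, delta, D)
        crit = ms_contrast(u, delta, h) + kappa0 * D / n
        if best is None or crit < best[0]:
            best = (crit, D, h)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.random() for _ in range(1000)]
    delta = [1 if rng.random() < x else 0 for x in u]  # F(u) = u
    print(select_ms(u, delta, [1, 2, 4, 8, 16]))
```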
2.2.3. Remark about the NPMLE estimator

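In the above setting, it is conceivable to consider the log-likelihood contrast $\gamma_n^{MLE}(t) = (1/n)\sum_{i=1}^n \left[\delta_i \log(t(U_i)) + (1 - \delta_i)\log(1 - t(U_i))\right]$. If $t$ is supposed to be a piecewise constant function with jumps only at the observed points, then the NPMLE $\hat F_n$, which maximizes $\gamma_n^{MLE}$, can be obtained by the max-min formula, see Jewell and van der Laan (2004):

$$\hat F_n(U_{(i)}) = \max_{j \le i}\ \min_{k \ge i}\ \frac{\sum_{l=j}^{k}\delta_{(l)}}{k - j + 1}, \qquad (2.12)$$

where $U_{(1)} \le \cdots \le U_{(n)}$ are the ordered examination times and $\delta_{(l)}$ is the indicator associated with $U_{(l)}$.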
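A direct, deliberately naive Python transcription of the max-min formula (2.12) follows. It costs $O(n^3)$ operations and is meant only to make the formula concrete. In practice the same values are usually obtained with the pool-adjacent-violators algorithm. Ties among the $U_i$ are broken by input order, which is an assumption of this sketch.

```python
def npmle_maxmin(u, delta):
    """NPMLE of F at the ordered examination times via the max-min formula (2.12)."""
    order = sorted(range(len(u)), key=lambda i: u[i])
    d = [delta[i] for i in order]
    n = len(d)
    prefix = [0]  # prefix[k] = delta_(1) + ... + delta_(k)
    for x in d:
        prefix.append(prefix[-1] + x)
    est = []
    for i in range(n):
        # max over j <= i of the min over k >= i of the average of delta_(j..k)
        est.append(max(
            min((prefix[k + 1] - prefix[j]) / (k - j + 1) for k in range(i, n))
            for j in range(i + 1)
        ))
    return [u[i] for i in order], est


if __name__ == "__main__":
    print(npmle_maxmin([0.3, 0.1, 0.4, 0.2], [1, 1, 1, 0]))
    # ([0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 1.0, 1.0])
```

The output is nondecreasing, as an estimated cdf should be. The first two ordered indicators (1, 0) are pooled to 1/2.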

Most results for the NPMLE are essentially of an asymptotic nature, as already mentioned. Nevertheless, it is of interest to include it in our simulation study as a benchmark, see Section 5. Its advantage is that no adaptation is required, but the NPMLE is a piecewise constant function. Another approach is to consider the histogram-type estimator introduced by Birgé (1999). Let $(I_j)_{1 \le j \le D} = ([a_{j-1}, a_j[)_{1 \le j \le D}$ be a partition of $[0, 1]$ and consider a piecewise constant function $t = \sum_{j=1}^D a_j \mathbf{1}_{I_j}$. If one looks for such a function that maximizes the contrast $\gamma_n^{MLE}$, one finds the estimator given by Birgé (1999): $\hat F_D = \sum_{j=1}^D \hat\alpha_j^{MLE}\mathbf{1}_{I_j}$, with

$$\hat\alpha_j^{MLE} = \frac{\frac{1}{n}\sum_{i=1}^n \delta_i \mathbf{1}_{I_j}(U_i)}{\frac{1}{n}\sum_{i=1}^n \mathbf{1}_{I_j}(U_i)} \quad \text{if } \sum_{i=1}^n \mathbf{1}_{I_j}(U_i) \neq 0,$$

and $\hat\alpha_j^{MLE} = 0$ otherwise, for $j = 1, \ldots, D$. If $D$ is of order $n^{1/3}$, Birgé (1999) gives weak assumptions ensuring that the $L^1$-risk $\mathbb{E}\left(\int_0^1 |\hat F_D(x) - F(x)|\,dx\right)$ is of order $n^{-1/3}$. A finer model selection strategy may be developed to take a possibly higher regularity of $F$ into account.
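On each cell, the log-likelihood $\sum_i[\delta_i\log a + (1-\delta_i)\log(1-a)]$ is maximized by the proportion of $\delta_i = 1$ among the $U_i$ falling in that cell. The Python sketch below computes this estimator on a regular partition of [0, 1]. The helper `default_cells` picks a number of cells of order $n^{1/3}$. It is a hypothetical illustration of that order of magnitude, not the cell counts used in Section 5.

```python
def birge_histogram(u, delta, D):
    """Histogram estimator on the regular partition of [0, 1] into D cells:
    the proportion of delta_i = 1 among the U_i in each cell, 0 for empty cells."""
    num, den = [0] * D, [0] * D
    for x, d in zip(u, delta):
        j = min(int(x * D), D - 1)
        num[j] += d
        den[j] += 1
    return [num[j] / den[j] if den[j] else 0.0 for j in range(D)]


def default_cells(n):
    """Hypothetical helper: a number of cells of order n^(1/3)."""
    return max(1, round(n ** (1 / 3)))


if __name__ == "__main__":
    print(birge_histogram([0.1, 0.2, 0.6, 0.7, 0.8], [0, 1, 1, 1, 0], 4))
    # [0.5, 0.0, 1.0, 0.0]
```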

The contrasts proposed here, however, have the advantage that the empirical processes to be controlled are linear with respect to the functions $t$, a property that the NPMLE does not share. Without this linearity, the theoretical study would be more technical and the estimation algorithms would be difficult to implement for general bases. Moreover, a Hellinger-type risk would have to be considered, in the same way as in Birgé and Rozenholc (2006). This explains why we rather consider the contrasts $\gamma_n^\psi$ and $\gamma_n^{MS}$.

Before studying both estimators, let us give some examples of collections of models.

2.3. Spaces of approximation

The main assumption is described by (H1). In this setting, an orthonormal basis of $S_m$ is denoted by $(\varphi_\lambda)_{\lambda \in \Lambda_m}$, where $|\Lambda_m| = D_m$. Let us mention that it follows from Birgé and Massart (1997) that Property (2.3) in the context of (H1) is equivalent to

$$\exists \Phi_0 > 0, \quad \Big\|\sum_{\lambda \in \Lambda_m}\varphi_\lambda^2\Big\|_\infty \le \Phi_0^2 D_m. \qquad (2.13)$$

Moreover, for some results we need the following additional assumption:

(H2) $(S_m)_{m \in \mathcal{M}_n}$ is a collection of nested models. We denote by $\mathcal{S}_n$ the space belonging to the collection such that $S_m \subset \mathcal{S}_n$ for all $m \in \mathcal{M}_n$, and by $N_n$ the dimension of this nesting space: $\dim(\mathcal{S}_n) = N_n$ (so that $D_m \le N_n$ for all $m \in \mathcal{M}_n$).

We consider more precisely the following examples:

[T] Trigonometric spaces: $S_m$ is generated by $\{1, \sqrt{2}\cos(2\pi j x), \sqrt{2}\sin(2\pi j x),\ j = 1, \ldots, m\}$, $D_m = 2m + 1$ and $\mathcal{M}_n = \{1, \ldots, [n/2] - 1\}$.
[P] Regular piecewise polynomial spaces: $S_m$ is generated by $m(r+1)$ polynomials, namely $r + 1$ polynomials of degree $0, 1, \ldots, r$ on each subinterval $[(j-1)/m, j/m]$, for $j = 1, \ldots, m$; $D_m = (r+1)m$ and $m \in \mathcal{M}_n = \{1, 2, \ldots, [n/(r+1)]\}$. For example, consider the orthogonal collection in $L^2([-1, 1])$ of Legendre polynomials $Q_k$, where the degree of $Q_k$ is equal to $k$, $|Q_k(x)| \le 1$ for all $x \in [-1, 1]$, $Q_k(1) = 1$ and $\int_{-1}^1 Q_k^2(u)\,du = 2/(2k+1)$. Then the orthonormal basis is given by $\varphi_{j,k}(x) = \sqrt{m(2k+1)}\,Q_k(2mx - 2j + 1)\,\mathbf{1}_{[(j-1)/m,\, j/m[}(x)$ for $j = 1, \ldots, m$ and $k = 0, \ldots, r$, with $D_m = (r+1)m$. In particular, the histogram basis corresponds to $r = 0$ and is simply defined by $\varphi_j(x) = \sqrt{D_m}\,\mathbf{1}_{[(j-1)/D_m,\, j/D_m[}(x)$, with $D_m = m$. We denote by [DP] the collection of piecewise polynomials corresponding to dyadic subdivisions, with $m = 2^q$ and $D_m = (r+1)2^q$.
[W] Dyadic wavelet-generated spaces with regularity $r$ and compact support, as described e.g. in Donoho and Johnstone (1998).

All those spaces satisfy (H1), with for instance $\Phi_0 = \sqrt{2}$ for collection [T] and $\Phi_0 = \sqrt{2r+1}$ for collection [P]. Moreover, [T], [DP] and [W] satisfy (H2), since they are nested, with $\mathcal{S}_n$ being the space with the largest dimension in the collection.
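The following Python sketch checks the piecewise Legendre basis of [P] numerically. It builds $\varphi_{j,k}$ from the three-term Legendre recurrence, approximates the Gram matrix on [0, 1] with a midpoint rule, and evaluates $\sum_{j,k}\varphi_{j,k}^2$, the quantity bounded in (2.13). The grid size and the choice m = 4, r = 2 are arbitrary. At a left cell endpoint every $Q_k^2$ equals 1, so the sum reaches $m(r+1)^2 = (r+1)D_m$, which lies below $(2r+1)D_m$.

```python
import math


def legendre(k, x):
    """Legendre polynomial Q_k(x) via (l + 1) Q_{l+1} = (2l + 1) x Q_l - l Q_{l-1}."""
    if k == 0:
        return 1.0
    q_prev, q = 1.0, x
    for l in range(1, k):
        q_prev, q = q, ((2 * l + 1) * x * q - l * q_prev) / (l + 1)
    return q


def phi(j, k, m, x):
    """phi_{j,k}(x) = sqrt(m(2k+1)) Q_k(2mx - 2j + 1) 1_{[(j-1)/m, j/m)}(x)."""
    if (j - 1) / m <= x < j / m:
        return math.sqrt(m * (2 * k + 1)) * legendre(k, 2 * m * x - 2 * j + 1)
    return 0.0


def gram(m, r, n_grid=4000):
    """Midpoint-rule approximation of the Gram matrix of the (r + 1) m basis functions."""
    idx = [(j, k) for j in range(1, m + 1) for k in range(r + 1)]
    h = 1.0 / n_grid
    xs = [(i + 0.5) * h for i in range(n_grid)]
    vals = [[phi(j, k, m, x) for x in xs] for j, k in idx]
    return [[h * sum(p * q for p, q in zip(va, vb)) for vb in vals] for va in vals]


def sum_of_squares(m, r, x):
    """sum_{j,k} phi_{j,k}(x)^2, the function whose sup norm appears in (2.13)."""
    return sum(phi(j, k, m, x) ** 2 for j in range(1, m + 1) for k in range(r + 1))


if __name__ == "__main__":
    m, r = 4, 2
    G = gram(m, r)
    print(max(abs(G[a][b] - (a == b)) for a in range(len(G)) for b in range(len(G))))
    print(sum_of_squares(m, r, 0.0), (2 * r + 1) * (r + 1) * m)
```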

3. Study of the quotient estimator 

Our aim is to estimate the cdf $F$ from the observations $(\delta_i, U_i)$, $i = 1, \ldots, n$.
3.1. Convergence results for one estimator 

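An explicit expression of the estimator follows from definitions (2.5)-(2.6) by using the orthonormal basis $(\varphi_\lambda)_{\lambda \in \Lambda_m}$ of $S_m$ described in (H1):

$$\hat\psi_m = \sum_{\lambda \in \Lambda_m}\hat a_\lambda \varphi_\lambda \quad \text{with} \quad \hat a_\lambda = \frac{1}{n}\sum_{i=1}^n \delta_i \varphi_\lambda(U_i). \qquad (3.1)$$

We also define $\psi_m$ as the orthogonal projection of $\psi$ on $S_m$. We can write

$$\psi_m = \sum_{\lambda \in \Lambda_m} a_\lambda \varphi_\lambda \quad \text{with} \quad a_\lambda = \int_0^1 \varphi_\lambda(x)\psi(x)\,dx. \qquad (3.2)$$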
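As an illustration of (3.1) and (3.2), the Python sketch below computes $\hat a_\lambda$ in the histogram basis. It compares them with the exact $a_\lambda$ in a setting where they are known in closed form: $F(u) = u$ and $g = 1$ on [0, 1], so $\psi(u) = u$ and $a_j = \sqrt{D}(2j-1)/(2D^2)$. The seed, n and D are arbitrary. By (3.3) with $\Phi_0 = 1$ and $\mathbb{E}(\delta_1) = 1/2$, the expected squared coefficient error is at most $D/(2n)$.

```python
import math
import random


def psi_hat_coefs(u, delta, D):
    """hat a_j = (1/n) sum_i delta_i phi_j(U_i), histogram basis phi_j = sqrt(D) 1_{I_j}."""
    n = len(u)
    a = [0.0] * D
    for x, d in zip(u, delta):
        if d:
            a[min(int(x * D), D - 1)] += math.sqrt(D) / n
    return a


def psi_m_coefs_uniform(D):
    """Exact a_j = int phi_j psi when psi(u) = u on [0, 1]: sqrt(D) (2j - 1) / (2 D^2)."""
    return [math.sqrt(D) * (2 * j - 1) / (2 * D * D) for j in range(1, D + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, D = 20000, 8
    u = [rng.random() for _ in range(n)]
    delta = [1 if rng.random() < x else 0 for x in u]  # X ~ U(0, 1), delta = 1{X <= U}
    err = sum((ah - am) ** 2 for ah, am in zip(psi_hat_coefs(u, delta, D), psi_m_coefs_uniform(D)))
    print(err, D / (2 * n))
```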

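The rate of the estimator $\hat\psi_m$ of $\psi$ is quite easy to derive. Indeed, it follows from (3.1), (3.2) and Pythagoras' theorem that

$$\begin{aligned}
\|\psi - \hat\psi_m\|^2 &= \|\psi - \psi_m\|^2 + \|\psi_m - \hat\psi_m\|^2 = \|\psi - \psi_m\|^2 + \sum_{\lambda \in \Lambda_m}(\hat a_\lambda - a_\lambda)^2 \\
&= \|\psi - \psi_m\|^2 + \sum_{\lambda \in \Lambda_m}\Big(\frac{1}{n}\sum_{i=1}^n \delta_i\varphi_\lambda(U_i) - \int_0^1 \psi(x)\varphi_\lambda(x)\,dx\Big)^2.
\end{aligned}$$

Therefore, since $\mathbb{E}(\delta_1\varphi_\lambda(U_1)) = a_\lambda$ and $\delta_1^2 = \delta_1$,

$$\begin{aligned}
\mathbb{E}\left(\|\psi - \hat\psi_m\|^2\right) &= \|\psi - \psi_m\|^2 + \frac{1}{n}\sum_{\lambda \in \Lambda_m}\mathrm{Var}\left(\delta_1\varphi_\lambda(U_1)\right) \\
&\le \|\psi - \psi_m\|^2 + \frac{1}{n}\mathbb{E}\Big(\delta_1\sum_{\lambda \in \Lambda_m}\varphi_\lambda^2(U_1)\Big) \\
&\le \|\psi - \psi_m\|^2 + \frac{\Phi_0^2 D_m}{n}\,\mathbb{E}(\delta_1),
\end{aligned}$$

where the last step uses (2.13). This can be summarized by the following proposition: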

Proposition 3.1. Consider the model described in Section 2.1 and the estimator $\hat\psi_m = \arg\min_{t \in S_m}\gamma_n^\psi(t)$, where $\gamma_n^\psi(t)$ is defined by (2.5) and $S_m$ is a $D_m$-dimensional linear space in a collection satisfying (H1). Then

$$\mathbb{E}\left(\|\psi - \hat\psi_m\|^2\right) \le \|\psi - \psi_m\|^2 + \frac{\Phi_0^2 D_m}{n}\,\mathbb{E}(\delta_1). \qquad (3.3)$$

Inequality (3.3) gives the asymptotic rate for one estimator if we consider that $\psi$ belongs to a Besov space $\mathcal{B}_{\alpha_\psi,p,\infty}([0,1])$ with finite Besov norm denoted by $|\psi|_{\alpha_\psi,p}$. For a precise definition of those notions, we refer to DeVore and Lorentz (1993), Chapter 2, Section 7, where it is also proved that $\mathcal{B}_{\alpha_\psi,p,\infty}([0,1]) \subset \mathcal{B}_{\alpha_\psi,2,\infty}([0,1])$ for $p \ge 2$. This justifies that we now restrict our attention to $\mathcal{B}_{\alpha_\psi,2,\infty}([0,1])$.

Then the following (standard) rate is obtained: 

Corollary 3.1. Consider the model described in Section 2.1 and the estimator $\hat\psi_m = \arg\min_{t \in S_m}\gamma_n^\psi(t)$, where $\gamma_n^\psi$ is defined by (2.5) and $S_m$ is a $D_m$-dimensional linear space in collection [T], [P] or [W]. Assume moreover that $\psi$ belongs to $\mathcal{B}_{\alpha_\psi,2,\infty}([0,1])$ with $r > \alpha_\psi > 0$, and choose a model with $m = m_n$ such that $D_{m_n} = O(n^{1/(2\alpha_\psi+1)})$. Then

$$\mathbb{E}\left(\|\psi - \hat\psi_{m_n}\|^2\right) = O\big(n^{-2\alpha_\psi/(2\alpha_\psi+1)}\big). \qquad (3.4)$$

Remark 3.1. The bound $r$ stands for the regularity of the basis functions for collections [P] and [W]. For the trigonometric collection [T], no upper bound for the unknown regularity is required.

Proof. The result is a straightforward consequence of the results of DeVore and Lorentz (1993) and of Lemma 12 of Barron et al. (1999), which imply that $\|\psi - \psi_m\|$ is of order $D_m^{-\alpha_\psi}$ in the three collections [T], [P] and [W], for any positive $\alpha_\psi$. Thus the minimum order in (3.3) is reached for a model $S_{m_n}$ with $D_{m_n} = O([n^{1/(1+2\alpha_\psi)}])$, which is less than $n$ for $\alpha_\psi > 0$. Then, if $\psi \in \mathcal{B}_{\alpha_\psi,2,\infty}([0,1])$ for some $\alpha_\psi > 0$, we find the standard nonparametric rate of convergence $n^{-2\alpha_\psi/(1+2\alpha_\psi)}$. □
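The balancing step behind this proof can be written out explicitly. In the display below, $\lesssim$ hides the constants $\Phi_0^2\,\mathbb{E}(\delta_1)$ of (3.3) and the constant of the bias bound. These constants do not affect the rate.

```latex
\begin{aligned}
\mathbb{E}\|\psi-\hat\psi_m\|^2 &\lesssim D_m^{-2\alpha_\psi} + \frac{D_m}{n},\\
D_m^{-2\alpha_\psi} = \frac{D_m}{n} &\iff D_m^{2\alpha_\psi+1} = n \iff D_m = n^{1/(2\alpha_\psi+1)},\\
\text{so that}\quad D_m^{-2\alpha_\psi} = \frac{D_m}{n} &= n^{-2\alpha_\psi/(2\alpha_\psi+1)}.
\end{aligned}
```

For instance, with $\alpha_\psi = 1$ and $n = 1000$, the balancing dimension is $D_m = 1000^{1/3} = 10$ and the rate is $1000^{-2/3} = 0.01$.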

3.2. Adaptive estimator of the density $\psi$

The penalized estimator is defined in order to ensure an automatic choice of the dimension. Indeed, it follows from Corollary 3.1 that the optimal dimension depends on the unknown regularity of the function to be estimated in the asymptotic setting, and more generally on the unknown constants involved in the squared-bias and variance terms. Then we define

$$\hat m = \arg\min_{m \in \mathcal{M}_n}\left[\gamma_n^\psi(\hat\psi_m) + \mathrm{pen}_\psi(m)\right],$$

where the penalty function $\mathrm{pen}_\psi$ is determined in order to lead to the choice of a "good" model. First, we apply a Talagrand (1996) type inequality to the linear empirical process defined by

$$\nu_n(t) := \frac{1}{n}\sum_{i=1}^n\left(\delta_i t(U_i) - \langle t, \psi\rangle\right). \qquad (3.5)$$

Then, by using the decomposition of the contrast given by

$$\gamma_n^\psi(t) - \gamma_n^\psi(s) = \|t - \psi\|^2 - \|s - \psi\|^2 - 2\nu_n(t - s), \qquad (3.6)$$

we easily derive the following result:

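Theorem 3.1. Consider the model described in Section 2.1 and the estimator $\hat\psi_m = \arg\min_{t \in S_m}\gamma_n^\psi(t)$, where $\gamma_n^\psi(t)$ is defined by (2.5) and $S_m$ is a $D_m$-dimensional linear space in a collection satisfying (H1) and (H2). Then the estimator $\hat\psi_{\hat m}$ with $\hat m$ defined by

$$\hat m = \arg\min_{m \in \mathcal{M}_n}\left[\gamma_n^\psi(\hat\psi_m) + \mathrm{pen}_\psi(m)\right]$$

and

$$\mathrm{pen}_\psi(m) \ge \kappa\,\Phi_0^2\Big(\int_0^1\psi(x)\,dx\Big)\frac{D_m}{n},$$

where $\kappa$ is a universal constant, satisfies

$$\mathbb{E}\left(\|\hat\psi_{\hat m} - \psi\|^2\right) \le \inf_{m \in \mathcal{M}_n}\left(3\|\psi - \psi_m\|^2 + 4\,\mathrm{pen}_\psi(m)\right) + \frac{C}{n}, \qquad (3.7)$$

where $C$ is a constant depending on $\Phi_0$ and on $\int_0^1\psi(x)\,dx$.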

As is clear from Theorem 3.1, only a lower bound for the penalty is provided. Since $\mathrm{pen}_\psi(\cdot)$ also appears in the risk bound (3.7), we should not take it much larger than the order $D_m/n$. Otherwise the $L^2$ error would increase and the resulting rate would not be the optimal one. On the other hand, no result is available for smaller penalties. In particular, this explains why increasing only the constant $\kappa$ leaves the asymptotic rate unchanged.

Also, if we choose $\mathrm{pen}_\psi(m) = \kappa\Phi_0^2\left(\int_0^1\psi(x)\,dx\right)(D_m/n)$, it follows from (3.7) that the adaptive estimator automatically makes the squared-bias/variance compromise. From an asymptotic point of view, it also reaches the optimal rate, provided that the constant in the penalty is known. Note that Inequality (3.7) is nevertheless non-asymptotic.

Remark 3.2. In practice, the constant in the penalty, denoted above by $\kappa$, is found by simulation experiments taking into account very different types of functions $\psi$. See examples of such work in Birgé and Rozenholc (2006) or Comte and Rozenholc (2004).

The penalty given in Theorem 3.1 cannot be used in practice, since it depends on the unknown quantity

$$\int_0^1\psi(x)\,dx = \mathbb{E}\left(\delta_1\mathbf{1}_{\{U_1 \le 1\}}\right).$$

A simple solution is to use the bound $\int_0^1\psi(x)\,dx \le 1$. It follows that Inequality (3.7) holds for a penalty defined by $\mathrm{pen}_\psi(m) = \kappa\Phi_0^2 D_m/n$. This works, but possibly with an over-estimation of the penalty whose size depends on the unknown function $\psi$. The alternative solution is to replace the unknown quantity by an estimator (rather than a bound), and to prove that the estimator of $\psi$ built with this random penalty keeps the adaptation property of the theoretical penalized estimator. This is described in the following theorem. Its proof is omitted since it is essentially the same as the proof of Theorem 3.4 in Brunel and Comte (2005).
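A minimal Python sketch of this plug-in strategy for the histogram collection follows. The unknown mass $\int_0^1\psi$ is replaced by its empirical counterpart $(1/n)\sum_i\delta_i\mathbf{1}_{\{U_i\le1\}}$, and the model is chosen by minimizing $\gamma_n^\psi(\hat\psi_m)$ plus the resulting random penalty. For an orthonormal basis, $\gamma_n^\psi(\hat\psi_m) = -\sum_\lambda\hat a_\lambda^2$. The candidate dimensions, κ = 4 and $\Phi_0 = 1$ (histograms) are assumptions of this sketch.

```python
import math
import random


def psi_hat_coefs(u, delta, D):
    """hat a_j = (1/n) sum_i delta_i phi_j(U_i) for the histogram basis on [0, 1]."""
    n = len(u)
    a = [0.0] * D
    for x, d in zip(u, delta):
        if d:
            a[min(int(x * D), D - 1)] += math.sqrt(D) / n
    return a


def select_psi(u, delta, dims, kappa=4.0, phi0=1.0):
    """Penalized choice of D with the random penalty
    kappa * phi0^2 * mass_hat * D / n, mass_hat = (1/n) sum_i delta_i 1{U_i <= 1}."""
    n = len(u)
    mass_hat = sum(d for x, d in zip(u, delta) if x <= 1.0) / n
    best = None
    for D in dims:
        a = psi_hat_coefs(u, delta, D)
        # gamma_n^psi(hat psi_m) = -sum_j hat a_j^2 for an orthonormal basis
        crit = -sum(c * c for c in a) + kappa * phi0 ** 2 * mass_hat * D / n
        if best is None or crit < best[0]:
            best = (crit, D, a)
    return best[1], best[2], mass_hat


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.random() for _ in range(2000)]
    delta = [1 if rng.random() < x else 0 for x in u]
    D, a, mass = select_psi(u, delta, [1, 2, 4, 8, 16, 32])
    print(D, mass)
```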

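Theorem 3.2. Assume that the assumptions of Theorem 3.1 are satisfied. Consider the estimator $\hat\psi_{\hat m}$ with $\hat m$ defined by

$$\hat m = \arg\min_{m \in \mathcal{M}_n}\left[\gamma_n^\psi(\hat\psi_m) + \widehat{\mathrm{pen}}_\psi(m)\right]$$

and

$$\widehat{\mathrm{pen}}_\psi(m) = \kappa\,\Phi_0^2\Big(\frac{1}{n}\sum_{i=1}^n\delta_i\mathbf{1}_{\{U_i \le 1\}}\Big)\frac{D_m}{n},$$

where $\kappa$ is a universal constant. Then $\hat\psi_{\hat m}$ satisfies

$$\mathbb{E}\left(\|\hat\psi_{\hat m} - \psi\|^2\right) \le K_0\inf_{m \in \mathcal{M}_n}\Big(\|\psi - \psi_m\|^2 + \kappa\,\Phi_0^2\Big(\int_0^1\psi(x)\,dx\Big)\frac{D_m}{n}\Big) + \frac{K}{n}, \qquad (3.8)$$

where $K_0$ is a universal constant and $K$ depends on $\psi$ and $\Phi_0$.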



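In particular, adaptation results to unknown smoothness can be derived quite straightforwardly from results such as Theorem 3.2:

Proposition 3.2. Consider the collection of models [T], [DP] or [W], with $r > \alpha_\psi > 0$. Assume that an estimator $\tilde\psi$ of $\psi$ satisfies Inequality (3.8) in Theorem 3.2 (respectively Inequality (3.7) in Theorem 3.1). Let $L > 0$. Then

$$\sup_{\psi \in \mathcal{B}_{\alpha_\psi,2,\infty}(L)}\mathbb{E}\left(\|\tilde\psi - \psi\|^2\right) \le C(\alpha_\psi, L)\,n^{-2\alpha_\psi/(2\alpha_\psi+1)}, \qquad (3.9)$$

where $\mathcal{B}_{\alpha_\psi,2,\infty}(L) = \{t \in \mathcal{B}_{\alpha_\psi,2,\infty}([0,1]),\ |t|_{\alpha_\psi,2} \le L\}$ and $C(\alpha_\psi, L)$ is a constant depending on $\alpha_\psi$, $L$, and also on $\psi$ and $\Phi_0$.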



3.3. Application to the estimation of the distribution function $F$

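Consider now the first estimator of $F$, given by (2.7).

A simple case study shows that if $\tilde\psi(x)/\hat g(x) < 0$ or $\tilde\psi(x)/\hat g(x) > 1$, then $|\tilde F(x) - F(x)| \le |\tilde\psi(x)/\hat g(x) - F(x)|$, since $F(x) \in [0, 1]$. Thus the inequality $|\tilde F(x) - F(x)| \le |\tilde\psi(x)/\hat g(x) - F(x)|$ holds for any $x$. Also, our definition implies that $|\tilde F(x) - F(x)| \le 1$ for any $x$. Moreover, to exploit [A1], we define

$$\Omega_g = \{\omega : \hat g(x) - g(x) > -g_0/2,\ \forall x \in [0, 1]\}.$$

Then, since $F = \psi/g$, the following bounds are obtained:

$$\begin{aligned}
\|\tilde F - F\|^2 &= \int_0^1(\tilde F(x) - F(x))^2\,dx\,\mathbf{1}_{\Omega_g} + \int_0^1(\tilde F(x) - F(x))^2\,dx\,\mathbf{1}_{\Omega_g^c} \\
&\le \int_0^1\Big(\frac{\tilde\psi(x)}{\hat g(x)} - \frac{\psi(x)}{g(x)}\Big)^2 dx\,\mathbf{1}_{\Omega_g} + \mathbf{1}_{\Omega_g^c}.
\end{aligned}$$

Thus the first term can be decomposed as follows:

$$\int_0^1\Big(\frac{\tilde\psi(x)}{\hat g(x)} - \frac{\psi(x)}{g(x)}\Big)^2 dx\,\mathbf{1}_{\Omega_g} \le 2\int_0^1\Big[\frac{(\tilde\psi(x) - \psi(x))^2}{\hat g^2(x)} + F^2(x)\frac{(\hat g(x) - g(x))^2}{\hat g^2(x)}\Big]dx\,\mathbf{1}_{\Omega_g},$$

and since $\hat g(x) \ge g_0/2$ on $\Omega_g$ and $F(x) \le 1$, this yields

$$\int_0^1\Big(\frac{\tilde\psi(x)}{\hat g(x)} - \frac{\psi(x)}{g(x)}\Big)^2 dx\,\mathbf{1}_{\Omega_g} \le \frac{8}{g_0^2}\left(\|\tilde\psi - \psi\|^2 + \|\hat g - g\|^2\right).$$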
For the second term, taking the expectation, we use the following lemma:

Lemma 3.1. Assume that $g \in \mathcal{B}_{\alpha_g,2,\infty}([0,1])$ for some $\alpha_g > 1/2$ and consider a collection of spaces $S_m$ such that $\log(n) \le D_m \le \sqrt{n}$. Then, under Assumptions [A1] and (H2), there exists a constant $C$ such that

$$\mathbb{P}(\Omega_g^c) \le \mathbb{P}\left(\|\hat g - g\|_\infty > g_0/2\right) \le \frac{C}{n}. \qquad (3.10)$$

Finally, by gathering the bounds, we obtain the following proposition:

Proposition 3.3. Under the assumptions of Lemma 3.1,

$$\mathbb{E}\|\tilde F - F\|^2 \le \frac{8}{g_0^2}\left(\mathbb{E}\|\tilde\psi - \psi\|^2 + \mathbb{E}\|\hat g - g\|^2\right) + \frac{C(g_0, \|\psi\|)}{n}, \qquad (3.11)$$

where $C(g_0, \|\psi\|)$ is a constant depending on $g_0$ and $\|\psi\|$.

From Inequality (3.11), we easily deduce, by using results (3.7) or (3.8), that $\tilde F$ is an adaptive estimator of $F$ if the functions $g$ and $\psi$ have the same regularity $\alpha = \alpha_g = \alpha_\psi$. We can state the following result:

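Proposition 3.4. Assume that $g \in \mathcal{B}_{\alpha_g,2,\infty}([0,1])$ and that $\psi \in \mathcal{B}_{\alpha_\psi,2,\infty}([0,1])$. Consider the collection of models [T], [DP] or [W], with dimensions $\log(n) \le D_m \le \sqrt{n}$ and with $r > \alpha_F = \alpha_\psi = \alpha_g > 1/2$. Let $\tilde F$ be the estimator defined by (2.7) and let $L > 0$. Then

$$\Big(\sup_{F \in \mathcal{B}_{\alpha_F,2,\infty}(L)}\mathbb{E}\|\tilde F - F\|^2\Big)^{1/2} \le C(\alpha_F, L)\,n^{-\alpha_F/(2\alpha_F+1)}, \qquad (3.12)$$

where $\mathcal{B}_{\alpha_F,2,\infty}(L) = \{t \in \mathcal{B}_{\alpha_F,2,\infty}([0,1]),\ |t|_{\alpha_F,2} \le L\}$ and $C(\alpha_F, L)$ is a constant depending on $\alpha_F$, $L$, and also on $\psi$, $\Phi_0$ and $g_0$.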

Note that Theorem 2 in Yang (2000) shows that the rate in the sup-norm over a compact set is of order $O\big((\ln n/n)^{(1+\alpha_f)/(3+2\alpha_f)}\big)$ a.s., where $\alpha_f$ stands for the regularity of the density function $f$ (that is, $\alpha_F = \alpha_f + 1$).

If the index of regularity of $F$, $\alpha_F$, is greater than the index of regularity of $\psi = Fg$, $\alpha_\psi$, then the asymptotic rate of the estimator $\tilde F$ is given by $n^{-\alpha_\psi/(1+2\alpha_\psi)}$ instead of the optimal one $n^{-\alpha_F/(1+2\alpha_F)}$. This is the reason why we propose another contrast to estimate $F$ directly.

4. Study of the mean square estimator 

In this section, we study the mean-square estimator of $F$ given by (2.9) and its adaptive version. In this context, we define the empirical norm $\|\cdot\|_n$ as follows: for $t \in S_m$, $\|t\|_n^2 = (1/n)\sum_{i=1}^n t^2(U_i)$. It is a natural norm in regression problems and, under [A1], it is equivalent in mean to the standard Lebesgue integrated $L^2$-norm:

$$\forall t \in S_m, \quad g_0\|t\|^2 \le \mathbb{E}\left(\|t\|_n^2\right) = \int t^2(x)g(x)\,dx \le g_1\|t\|^2.$$

Then, the mean-square contrast defined by (2.8) can be decomposed as follows:

$$\gamma_n^{MS}(t) - \gamma_n^{MS}(s) = \|t - F\|_n^2 - \|s - F\|_n^2 - 2\nu_n^{MS}(t - s), \qquad (4.1)$$

where $\nu_n^{MS}(\cdot)$ is defined by:

$$\nu_n^{MS}(t) = \frac{1}{n}\sum_{i=1}^n\left(\delta_i - F(U_i)\right)t(U_i), \qquad (4.2)$$

which is a centered process since $\mathbb{E}(\delta \mid U = u) = F(u)$.
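The decomposition (4.1) is an algebraic identity: expanding $(\delta_i - t(U_i))^2 = (\delta_i - F(U_i))^2 + (F(U_i) - t(U_i))^2 + 2(\delta_i - F(U_i))(F(U_i) - t(U_i))$ and subtracting the same expansion for $s$ gives it for any data. The centering of $\nu_n^{MS}$ only matters when taking expectations. The Python sketch below checks the identity numerically. The functions $t$, $s$ and $F(u) = u^2$ are arbitrary choices for illustration.

```python
import random


def gamma_ms(u, delta, t):
    """Mean-square contrast (2.8)."""
    return sum((d - t(x)) ** 2 for x, d in zip(u, delta)) / len(u)


def emp_sq_norm(u, f):
    """Squared empirical norm ||f||_n^2 = (1/n) sum_i f(U_i)^2."""
    return sum(f(x) ** 2 for x in u) / len(u)


def nu_ms(u, delta, F, t):
    """Process (4.2): (1/n) sum_i (delta_i - F(U_i)) t(U_i)."""
    return sum((d - F(x)) * t(x) for x, d in zip(u, delta)) / len(u)


def decomposition_gap(u, delta, F, t, s):
    """Left-hand side minus right-hand side of (4.1); zero up to rounding."""
    lhs = gamma_ms(u, delta, t) - gamma_ms(u, delta, s)
    rhs = (emp_sq_norm(u, lambda x: t(x) - F(x))
           - emp_sq_norm(u, lambda x: s(x) - F(x))
           - 2 * nu_ms(u, delta, F, lambda x: t(x) - s(x)))
    return lhs - rhs


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.random() for _ in range(500)]
    delta = [1 if rng.random() < x * x else 0 for x in u]
    print(decomposition_gap(u, delta, lambda x: x * x, lambda x: 0.2 + 0.5 * x, lambda x: x ** 3))
```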

In this case, we obtain the following result for the penalized estimator:

Theorem 4.1. Consider the collection of models [T] with $N_n \le \sqrt{n}/\ln(n)$, or [DP] or [W] with $N_n \le n/\ln^2(n)$. Let $\hat F_{\hat m_0}$ be defined by (2.10), with

$$\mathrm{pen}_{MS}(m) \ge \kappa_0\frac{D_m}{n}.$$

Then

$$\mathbb{E}\left(\|\hat F_{\hat m_0} - F\|_n^2\right) \le C\inf_{m \in \mathcal{M}_n}\left(\|F_m - F\|^2 + \mathrm{pen}_{MS}(m)\right) + \frac{C'}{n}, \qquad (4.3)$$

where $F_m$ stands for the orthogonal projection of $F$ on $S_m$, and $C$ and $C'$ are constants depending on $\Phi_0$ and $g$.
Fig 2. Plot of 15 regression estimators $\hat F_{\hat m_0}$ for Model 4 with n = 500.

Note that this estimator may be more tedious to compute in practice than the quotient one, but the result is obtained directly for the estimator of $F$, without any regularity condition on $\psi$. As a consequence, we obtain here a rate depending only on the regularity of $F$, and we can state the following result:

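Proposition 4.1. Consider the collection of models [T] with $\alpha_F > 1/2$, or [DP] or [W] with $r > \alpha_F > 0$, and the estimator $\hat F_{\hat m_0}$ defined by (2.10)-(2.11). Let $L > 0$. Then

$$\Big(\sup_{F \in \mathcal{B}_{\alpha_F,2,\infty}(L)}\mathbb{E}\|F - \hat F_{\hat m_0}\|_n^2\Big)^{1/2} \le C(\alpha_F, L)\,n^{-\alpha_F/(2\alpha_F+1)}, \qquad (4.4)$$

where $\mathcal{B}_{\alpha_F,2,\infty}(L) = \{t \in \mathcal{B}_{\alpha_F,2,\infty}([0,1]),\ |t|_{\alpha_F,2} \le L\}$ and $C(\alpha_F, L)$ is a constant depending on $\alpha_F$, $L$, and also on $F$, $\Phi_0$ and $g$.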

5. Simulations 

We consider the regular collection [DP] (see Section 2.3) with degrees less than $r_{\max}$ on the subdivisions $[j/2^p, (j+1)/2^p[$. The density and regression algorithms minimize the contrast and select the approximation space, in the sense that the integers $p$ and $r$ are selected such that $2^p(r+1) \le N_n \le n/\log^2(n)$ and $r \in \{0, 1, \ldots, r_{\max}\}$. Note that the degree $r$ is global in the sense that it is the same on all the intervals of the subdivision. We take $r_{\max} = 9$ in practice. Moreover, additive (but negligible) correcting terms are classically involved in the penalty (see Comte and Rozenholc (2004)). Such terms avoid under-penalization and are in accordance with the fact that the theorems provide lower bounds for the penalty. As the correcting terms are asymptotically negligible, they do not affect the rate of convergence. The constants in the penalty are taken equal to 4.
Finally, for $m = (p, r)$, the penalties are proportional to $2^p(r + 1 + \log^{2.5}(r+1))$, with proportionality factor $\kappa = 4$ for the estimation of $g$ and $F$, and a factor $(4/n)\sum_{i=1}^n\delta_i$ for the estimation of $\psi$. Most programs are available on Yves Rozenholc's web page http://www.math-info.univ-paris5.fr/~rozcn/.

Table 1
Monte-Carlo results for the MSE (×10^-2) of the quotient, regression and NPMLE estimators of the cdf F, for J = 200 or 500 sample replications.

Estimator             Model   n = 60   n = 200   n = 500   n = 1000
Quotient est.           1      1.75     0.49      0.16      0.08
Quotient est.           2      2.73     0.83      0.38      0.22
Quotient est.           3      1.99     0.64      0.28      0.09
Quotient est.           4      5.96     3.99      1.86      0.36
Quotient est.           5      1.89     0.79      0.37      0.18
Regression est.         1      0.50     0.13      0.05      0.03
Regression est.         2      10.3     0.89      0.23      0.01
Regression est.         3      1.13     0.39      0.07      0.03
Regression est.         4      7.40     2.40      0.48      0.19
Regression est.         5      0.76     0.27      0.11      0.07
Birgé's NPMLE           1      1.72     0.75      0.41      0.25
Birgé's NPMLE           2      2.15     0.75      0.51      0.27
Birgé's NPMLE           3      1.52     0.76      0.39      0.25
Birgé's NPMLE           4      2.90     1.11      0.66      0.38
Birgé's NPMLE           5      1.20     0.93      0.32      0.27
Groeneboom's NPMLE      1      1.94     0.75      0.37      0.24
Groeneboom's NPMLE      2      1.97     0.76      0.40      0.23
Groeneboom's NPMLE      3      2.15     0.80      0.40      0.22
Groeneboom's NPMLE      4      2.04     0.82      0.43      0.26
Groeneboom's NPMLE      5      0.93     0.38      0.20      0.12
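The Python sketch below lists the candidate models $m = (p, r)$ of the dyadic collection described in this section and the shape $2^p(r + 1 + \log^{2.5}(r+1))$ to which the penalties are proportional. Two points are assumptions: the logarithms are taken as natural logarithms, and $N_n$ is set to its upper limit $n/\log^2(n)$. The multiplicative factor and any normalization by $n$ are left to the caller.

```python
import math


def dp_models(n, r_max=9):
    """Pairs m = (p, r): 2^p dyadic cells and global degree r <= r_max,
    with dimension 2^p (r + 1) <= N_n = n / log(n)^2 (natural log assumed)."""
    N_n = n / math.log(n) ** 2
    models = []
    for r in range(r_max + 1):
        p = 0
        while 2 ** p * (r + 1) <= N_n:
            models.append((p, r))
            p += 1
    return models


def penalty_shape(p, r):
    """2^p (r + 1 + log(r + 1)^2.5); multiply by the chosen factor (e.g. 4) and
    normalize as needed (assumption of this sketch)."""
    return 2 ** p * (r + 1 + math.log(r + 1) ** 2.5)


if __name__ == "__main__":
    models = dp_models(1000)
    print(len(models), models[:5])
    print([round(penalty_shape(p, r), 2) for p, r in models[:5]])
```

For n = 1000, $N_n \approx 20.96$. This leaves 28 pairs (p, r) to compare.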

Now, let us describe the simulated models. Recall that the distribution of $\delta$ given $U = u$ is a Bernoulli distribution with parameter $F(u)$. We consider the following models for generating data:

Model 1. Uniform distribution $F$: $U \sim \mathcal{U}(0, 1)$ and $\delta \sim \mathcal{B}(1, U)$.
Model 2. $\chi^2$-distribution $F$: $U \sim \mathcal{U}(0, 1)$ and $\delta \sim \mathcal{B}(1, F_{\chi^2}(U))$.
Model 3. Quadratic distribution $F$: $U \sim \mathcal{U}(0, 1)$ and $\delta \sim \mathcal{B}(1, U^2)$.
Model 4. Exponential distribution $F$: $U \sim \Gamma(1, \lambda)$ and $\delta \sim \mathcal{B}(1, 1 - e^{-U/\lambda})$, with $\lambda = 0.5$.
Model 5. Beta distribution (S-shape) $F$: $U \sim \beta(4, 6)$ and $\delta \sim \mathcal{B}(1, F_{\beta(a,b)}(U))$, where $F_{\beta(a,b)}$ is the cdf of a Beta distribution with parameters $(a, b)$.
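The two models whose parameters are fully explicit above, Models 1 and 3, can be simulated directly from the Bernoulli description. The Python sketch below does this. Drawing $\delta$ as $\mathbf{1}\{V < F(U)\}$ with an independent uniform $V$ is one standard way to sample $\mathcal{B}(1, F(U))$. The seed and sample size are arbitrary.

```python
import random

# cdf F of the unobserved lifetime X for the two fully specified models
MODEL_CDF = {
    1: lambda u: u,       # Model 1: uniform
    3: lambda u: u * u,   # Model 3: quadratic
}


def simulate_current_status(model, n, rng):
    """Draw (U_i, delta_i), i = 1..n, with U ~ U(0, 1) and delta | U = u ~ B(1, F(u))."""
    F = MODEL_CDF[model]
    u = [rng.random() for _ in range(n)]
    delta = [1 if rng.random() < F(x) else 0 for x in u]
    return u, delta


if __name__ == "__main__":
    rng = random.Random(0)
    u, delta = simulate_current_status(3, 1000, rng)
    print(sum(delta) / len(delta))  # close to E(U^2) = 1/3
```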

To study the quality of each estimation procedure and to compare them, we compute, over $J$ sample replications of size $n = 60, 200, 500$ and $1000$, the mean squared errors (MSE) over the sample points $u_1, \ldots, u_K$ falling in $[a, b]$:

$$\mathrm{MSE}_j = \frac{1}{K}\sum_{k=1}^K\left[\hat F_j(u_k) - F(u_k)\right]^2,$$

where $\hat F_j$ stands for the (adaptive) quotient estimator $\tilde F$, for the penalized regression estimator $\hat F_{\hat m_0}$ or for the benchmark NPMLEs, computed over the $j$th sample replication, for $j = 1, \ldots, J$. For the small sample sizes $n = 60$ and $n = 200$, the average values are obtained with $J = 500$ repetitions, while for large samples ($n = 500$ and $n = 1000$), only $J = 200$ replications are performed. Boundary effects arise from the sparsity of the observations at the end of the interval, particularly for models 2, 4 and 5. To avoid them, the $\mathrm{MSE}_j$'s are truncated for each replication: we include in the mean only the $u_k$ less than a given quantile value, namely $\mathbb{P}(X \le 1) = 0.68$ for model 2, $\mathbb{P}(X \le 1) = 0.86$ for model 4 and $\mathbb{P}(X \le 0.5) = 0.89$ for model 5. Thus, the $\mathrm{MSE}_j$'s are computed over $[a, b]$ with $a = 0$ and $b = 1$ for models 1 to 4, and $b = 0.5$ for model 5. Therefore, the MSEs given in Table 1 are the arithmetic means of the truncated $\mathrm{MSE}_j$'s.
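The Python sketch below implements the truncated criterion just described. For each replication, it averages the squared errors over the sample points lying in [a, b], then takes the arithmetic mean over replications. Representing a replication as a (points, estimator) pair is an assumption of this sketch. Table 1 reports these means in units of $10^{-2}$.

```python
def truncated_mse(points, F_hat, F, a, b):
    """MSE_j: mean of [F_hat(u_k) - F(u_k)]^2 over the sample points u_k in [a, b]."""
    kept = [x for x in points if a <= x <= b]
    if not kept:
        raise ValueError("no sample point in [a, b]")
    return sum((F_hat(x) - F(x)) ** 2 for x in kept) / len(kept)


def mean_truncated_mse(replications, F, a, b):
    """Arithmetic mean of MSE_j over the J replications, each given as (points, F_hat)."""
    values = [truncated_mse(pts, F_hat, F, a, b) for pts, F_hat in replications]
    return sum(values) / len(values)


if __name__ == "__main__":
    F = lambda x: x
    reps = [([0.2, 0.4, 0.6], lambda x: 0.5), ([0.1, 0.9], lambda x: x)]
    print(100 * mean_truncated_mse(reps, F, 0.0, 1.0))  # in units of 10^-2
```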

As we can see from the results in Table 1, the regression estimator is always better than the quotient and the NPMLE estimators for large samples. However, for small sample sizes, the quotient estimator can behave as well as, or even better than, the regression one; see models 2 and 4 for n = 60, 200. The same remark holds for both Birgé's and Groeneboom's NPMLEs. These last estimators have the advantage of being very easy to compute. As a counterpart, they look like step functions whatever the regularity of the function. Moreover, Birgé's estimator is not adaptive, since we have to choose the number of partition cells (D = 5 cells for sample sizes n = 60, 200 and D = 10 cells for n = 500, 1000), see Figure 3. Note also that the density estimator $\hat g$ of $g$ is a very attractive estimator by itself, as shown in Figure 1. In some cases, and particularly for model 4 (see Figure 1), the quotient mechanism performs poorly even when the density estimator performs very well. Figure 1 (right) shows that nearly half of the curves do not have the right shape. This is a drawback of quotient strategies, which do not have good robustness properties. Regression estimators (see Figure 2) are much more stable. In Figure 3, we illustrate all the compared estimators for small (n = 60) and large (n = 1000) samples. Our adaptive estimators behave as well as, and often better than, the benchmark NPMLEs.

Concluding remarks. Globally, it appears that the regression estimator is better than the quotient estimator, from both theoretical and empirical points of view. The latter can be better than the former only in small-sample experiments. The two density estimators involved in the quotient are nevertheless easy to compute and empirically very good, which shows that the estimation algorithms themselves give very good results. Nevertheless, even with a well-estimated numerator and denominator, the ratio is less satisfactory than the direct regression estimator.

Fig 3. True cdf (black thick line), Regression estimator (red), Quotient estimator (green), Birgé's NPMLE (blue) and Groeneboom's NPMLE (magenta) for Models 1 to 5 (top to bottom) with n = 60 (left) and n = 1000 (right).

6. Proofs

6.1. Talagrand's Inequality

The following version of Talagrand's Inequality (see Talagrand (1996)) is very useful in most of the proofs:

Lemma 6.1. Let $Z_1, \dots, Z_n$ be i.i.d. random variables and let $\nu_n(\ell)$ be defined by $\nu_n(\ell) = \frac{1}{n}\sum_{i=1}^{n}[\ell(Z_i) - E(\ell(Z_i))]$ for $\ell$ belonging to a countable class $\mathcal{C}$ of uniformly bounded measurable functions. Then there exists a universal constant $K_1$ such that, for any positive $\eta$ and $\lambda$,

$$P\Big(\sup_{\ell\in\mathcal{C}}|\nu_n(\ell)| \ge (1+\eta)H + \lambda\Big) \le 3\exp\Big[-K_1 n\Big(\frac{\lambda^2}{v} \wedge \frac{(\eta\wedge 1)\lambda}{M_1}\Big)\Big]. \qquad (6.1)$$

Moreover, there exists a universal constant $K_1$ such that, for $\varepsilon > 0$,

$$E\Big[\Big(\sup_{\ell\in\mathcal{C}}|\nu_n(\ell)|^2 - 2(1+2\varepsilon)H^2\Big)_+\Big] \le \frac{6}{K_1}\Big(\frac{v}{n}e^{-K_1\varepsilon\frac{nH^2}{v}} + \frac{8M_1^2}{K_1 n^2 C^2(\varepsilon)}e^{-\frac{K_1 C(\varepsilon)\sqrt{\varepsilon}}{\sqrt{2}}\frac{nH}{M_1}}\Big), \qquad (6.2)$$

with $C(\varepsilon) = (\sqrt{1+\varepsilon} - 1)\wedge 1$, and where

$$\sup_{\ell\in\mathcal{C}}\|\ell\|_\infty \le M_1, \qquad E\Big(\sup_{\ell\in\mathcal{C}}|\nu_n(\ell)|\Big) \le H, \qquad \sup_{\ell\in\mathcal{C}}\mathrm{Var}(\ell(Z_1)) \le v.$$

Note that (6.1) is also given in Birgé and Massart (1998), Corollary 2. In both cases, standard density arguments allow one to replace the class $\mathcal{C}$ by a unit ball of a finite-dimensional space of functions.



6.2. Proof of Lemma 3.1 

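Let us write

$$\|\hat g - g\|_\infty \le \|g - g_{\hat m_g}\|_\infty + \|\hat g_{\hat m_g} - g_{\hat m_g}\|_\infty,$$

with $\hat g = \hat g_{\hat m_g}$ defined by (2.4). Since $g$ belongs to some Besov space $\mathcal{B}^{\alpha_g}_{2,\infty}([0,1])$ with $\alpha_g > 1/2$, and since $\mathcal{B}^{\alpha_g}_{2,\infty}([0,1]) \subset \mathcal{B}^{\alpha_g - 1/2}_{\infty,\infty}([0,1])$, Lemma 12 in Barron et al. (1999) gives (under the restriction $D_m \ge \log(n)$ for all $m$):

$$\|g - g_{\hat m_g}\|_\infty \le C D_{\hat m_g}^{-(\alpha_g - 1/2)} \le C(\log n)^{-(\alpha_g - 1/2)}.$$

Thus, $\|g - g_{\hat m_g}\|_\infty$ decreases to 0 as $n$ goes to $\infty$, and for some integer $n_0$ large enough we have, for $n \ge n_0$,

$$P(\|\hat g - g\|_\infty > g_0/2) \le P(\|\hat g_{\hat m_g} - g_{\hat m_g}\|_\infty > g_0/4).$$

Now, $\|\hat g_{\hat m_g} - g_{\hat m_g}\|_\infty \le \Phi_0\sqrt{D_{\hat m_g}}\,\|\hat g_{\hat m_g} - g_{\hat m_g}\|$ and $\|\hat g_{\hat m_g} - g_{\hat m_g}\|^2 = \sum_{\lambda\in\Lambda_{\hat m_g}}\nu_{n,g}^2(\varphi_\lambda) = \sup_{t\in B_{\hat m_g}(0,1)}\nu_{n,g}^2(t)$. This implies

$$P(\|\hat g - g\|_\infty > g_0/2) \le P\Big(\sup_{t\in B_{\hat m_g}(0,1)}|\nu_{n,g}(t)| > \frac{g_0}{4\Phi_0\sqrt{D_{\hat m_g}}}\Big) \le \sum_{m\in\mathcal{M}_n}P\Big(\sup_{t\in B_m(0,1)}|\nu_{n,g}(t)| > \frac{g_0}{4\Phi_0\sqrt{D_m}}\Big). \qquad (6.3)$$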

We apply Inequality (6.1) to the class of functions $\mathcal{C} = B_m(0,1)$ by taking $\ell = t - E(t(U_1))$, with

$$\sup_{t\in B_m(0,1)}\|t\|_\infty \le \Phi_0\sqrt{D_m} := M_1, \qquad \sup_{t\in B_m(0,1)}\mathrm{Var}(t(U_1)) \le \sup_{t\in B_m(0,1)}\int_0^1 t^2(u)g(u)\,du \le g_1 := v,$$

and $E\big(\sup_{t\in B_m(0,1)}\nu_{n,g}^2(t)\big) = \frac{1}{n}\sum_{\lambda\in\Lambda_m}\mathrm{Var}(\varphi_\lambda(U_1)) \le \Phi_0^2 D_m/n := H^2$. By choosing $\eta = 1$ and $\lambda = g_0/(8\Phi_0\sqrt{D_m})$, and if $2H + \lambda \le g_0/(4\Phi_0\sqrt{D_m})$, we obtain from (6.3):

$$P(\|\hat g - g\|_\infty > g_0/2) \le \sum_{m\in\mathcal{M}_n}3\exp\Big[-K_1 n\Big(\frac{\lambda^2}{g_1}\wedge\frac{\lambda}{\Phi_0\sqrt{D_m}}\Big)\Big] \le \sum_{m\in\mathcal{M}_n}3\exp\big[-K_1 C_1 n/D_m\big],$$

with $C_1 = \frac{g_0^2}{64\Phi_0^2 g_1}\wedge\frac{g_0}{8\Phi_0^2}$, provided that $2H + \lambda \le g_0/(4\Phi_0\sqrt{D_m})$. With $H = \Phi_0\sqrt{D_m/n}$, this condition holds if $D_m \le [g_0/(16\Phi_0^2)]\sqrt{n}$. Thus, we can deduce that

$$P(\|\hat g - g\|_\infty > g_0/2) \le 3|\mathcal{M}_n|\exp\big[-K_1 C_1'\sqrt{n}\big],$$

with $C_1' = C_1/[g_0/(16\Phi_0^2)]$. Finally, since $|\mathcal{M}_n| \le n$, if $D_m \le K_1 C_1 n/(2\ln(n))$, then $P(\|\hat g - g\|_\infty > g_0/2) \le 3/n$, and this concludes the proof. □
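A simulation can check the identity $\|\hat g_m - g_m\|^2 = \sum_\lambda \nu_{n,g}^2(\varphi_\lambda) = \sup_{t\in B_m(0,1)}\nu_{n,g}^2(t)$ together with the value of $H^2$ used in this proof. The sketch below assumes a regular histogram basis on $[0,1]$ (for which $\Phi_0 = 1$) and a uniform $U$. Both are illustrative choices, not the settings of the paper. It takes $\nu_{n,g}(t) = \frac1n\sum_i[t(U_i) - E\,t(U_i)]$, which matches the centring $\ell = t - E(t(U_1))$ used above. In this case $\mathrm{Var}(\varphi_\lambda(U_1)) = 1 - 1/D$, so $E\sup\nu_{n,g}^2 = (D-1)/n \le \Phi_0^2 D/n$.

```python
import numpy as np


def histogram_basis(u, D):
    """Regular histogram basis on [0, 1]: phi_j = sqrt(D) * 1_{[j/D, (j+1)/D)}.

    Returns the (len(u), D) matrix of basis values; assumes u lies in [0, 1].
    """
    idx = np.minimum((np.asarray(u) * D).astype(int), D - 1)
    B = np.zeros((len(u), D))
    B[np.arange(len(u)), idx] = np.sqrt(D)
    return B


def mc_sup_nu2(n, D, reps=2000, seed=0):
    """Monte Carlo mean of sup_{t in B_m(0,1)} nu_n^2(t) for uniform U.

    For an orthonormal basis the supremum equals sum_j nu_n^2(phi_j).
    """
    rng = np.random.default_rng(seed)
    vals = np.empty(reps)
    for r in range(reps):
        u = rng.uniform(size=n)
        # E phi_j(U) = sqrt(D) * (1 / D) = 1 / sqrt(D) when U is uniform.
        nu = histogram_basis(u, D).mean(axis=0) - 1.0 / np.sqrt(D)
        vals[r] = np.sum(nu ** 2)
    return float(vals.mean())
```

For n = 200 and D = 8, the Monte Carlo mean should be close to $7/200 = 0.035$, below the bound $D/n = 0.04$.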



6.3. Proof of Theorem 3.1 

6.3.1. Proof of a preliminary Lemma 
First, we prove the following lemma: 

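Lemma 6.2. Assume that (H1) and (H2) are fulfilled. Denote by $B_{m,m'}(0,1) = \{t \in S_m + S_{m'}, \|t\| = 1\}$ and by $D(m,m')$ the dimension of $S_m + S_{m'}$. Let $\nu_n(\ell_t)$ be defined by (3.5) with

$$\ell_t(u,\delta) = \delta\,t(u). \qquad (6.4)$$

Then, for $\varepsilon > 0$,

$$E\Big(\sup_{t\in B_{m,m'}(0,1)}\nu_n^2(\ell_t) - p_\varepsilon(m,m')\Big)_+ \le \kappa_1\Big(\frac{e^{-\kappa_2\varepsilon D(m,m')}}{n} + \frac{e^{-\kappa_3 C(\varepsilon)\sqrt{\varepsilon}\sqrt{n}}}{C^2(\varepsilon)\,n}\Big), \qquad (6.5)$$

with $p_\varepsilon(m,m') = 2(1+2\varepsilon)\Phi_0^2\int_0^1\psi(x)\,dx\,(D_m + D_{m'})/n$ and $C(\varepsilon) = (\sqrt{1+\varepsilon} - 1)\wedge 1$. The constants $\kappa_i$, $i = 1, 2, 3$, depend on $\Phi_0$, $\psi$ and $F$.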

We apply Talagrand's inequality (6.2) by taking $Z_i = (U_i, \delta_i)$ for $i = 1, \dots, n$ and $\ell(u,\delta) = \ell_t(u,\delta)$. Standard density arguments show that this result can be applied to the class of functions $\mathcal{C} = \{\ell_t, t \in B_{m,m'}(0,1)\}$. We then obtain the following bounds for the present empirical process:

$$\sup_{\ell\in\mathcal{C}}\|\ell\|_\infty = \sup_{t\in B_{m,m'}(0,1)}\|\ell_t\|_\infty \le \Phi_0\sqrt{D(m,m')} := M_1,$$

with $D(m,m')$ denoting the dimension of $S_m + S_{m'}$. Then

$$\sup_{\ell\in\mathcal{C}}\mathrm{Var}(\ell(U_1,\delta_1)) = \sup_{t\in B_{m,m'}(0,1)}\mathrm{Var}(\ell_t(U_1,\delta_1)) \le \sup_{t\in B_{m,m'}(0,1)}E(\delta_1 t^2(U_1)) = \sup_{t\in B_{m,m'}(0,1)}\int_0^1 t^2(u)\psi(u)\,du \le g_1 := v.$$



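Lastly,

$$E\Big(\sup_{\ell\in\mathcal{C}}\nu_n^2(\ell)\Big) = E\Big(\sup_{t\in B_{m,m'}(0,1)}\nu_n^2(\ell_t)\Big) \le \sum_{\lambda\in\Lambda_{m,m'}}\frac{\mathrm{Var}(\delta_1\varphi_\lambda(U_1))}{n} \le \frac{\Phi_0^2 D(m,m')\int_0^1\psi(x)\,dx}{n} := H^2,$$

with the natural notation $\Lambda_{m,m'} = \Lambda_m \cup \Lambda_{m'}$. Then it follows from (6.2) that

$$E\Big(\sup_{t\in B_{m,m'}(0,1)}\nu_n^2(\ell_t) - p_\varepsilon(m,m')\Big)_+ \le \kappa_1\Big(\frac{e^{-\kappa_2\varepsilon D(m,m')}}{n} + \frac{e^{-\kappa_3 C(\varepsilon)\sqrt{\varepsilon}\sqrt{n}}}{C^2(\varepsilon)\,n}\Big),$$

where the $\kappa_i$, $i = 1, 2, 3$, are constants depending on $K_1$ and $C_1 = \Phi_0^2\int_0^1\psi(x)\,dx$, and $p_\varepsilon(m,m') = 2(1+2\varepsilon)C_1(D_m + D_{m'})/n$. □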

6.3.2. Proof of Theorem 3.1

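It follows from the definition of $\hat\psi_{\hat m}$ that, for all $m \in \mathcal{M}_n$,

$$\gamma_n^\psi(\hat\psi_{\hat m}) + \mathrm{pen}^\psi(\hat m) \le \gamma_n^\psi(\psi_m) + \mathrm{pen}^\psi(m). \qquad (6.6)$$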

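Then, by using decomposition (3.6), it follows from (6.6) and from the definition of the process $\nu_n(\ell_t)$ given by (3.5) that

$$\|\hat\psi_{\hat m} - \psi\|^2 \le \|\psi_m - \psi\|^2 + 2\nu_n(\ell_{\hat\psi_{\hat m} - \psi_m}) + \mathrm{pen}^\psi(m) - \mathrm{pen}^\psi(\hat m)$$

$$\le \|\psi_m - \psi\|^2 + \frac14\|\hat\psi_{\hat m} - \psi_m\|^2 + 4\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(\ell_t) + \mathrm{pen}^\psi(m) - \mathrm{pen}^\psi(\hat m),$$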

where we recall that $B_{m,\hat m}(0,1) = \{t \in S_m + S_{\hat m}, \|t\| \le 1\}$. Note that the norm connection described by (2.3) still holds for any element $t$ of $S_m + S_{m'}$ in the form $\|t\|_\infty \le \Phi_0\sqrt{\max(D_m, D_{m'})}\,\|t\|$. Indeed, under (H2) we restrict our attention to nested collections of models, so that $S_m + S_{m'}$ is equal to the larger of the two spaces. For a fixed integer $m$, we denote by $D(m')$ the dimension of $S_m + S_{m'}$, for all $m' \in \mathcal{M}_n$. Note that $D(m') = \max(D_m, D_{m'}) \le D_m + D_{m'}$.
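For nested collections, a numerical check confirms that $S_m + S_{m'}$ is the larger of the two spaces. The following minimal sketch assumes regular dyadic histogram spaces, an illustrative choice of nested models. It reads off the dimension of the sum as the rank of the stacked basis values on a fine grid. The function names are hypothetical.

```python
import numpy as np


def dyadic_histogram_matrix(x, j):
    """Values on x of the 2**j indicators of the regular dyadic partition of [0, 1)."""
    D = 2 ** j
    idx = np.minimum((np.asarray(x) * D).astype(int), D - 1)
    M = np.zeros((len(x), D))
    M[np.arange(len(x)), idx] = 1.0
    return M


def dim_of_sum(x, j1, j2):
    """Dimension of S_m + S_m', read off as the rank of the stacked basis values."""
    stacked = np.hstack([dyadic_histogram_matrix(x, j1), dyadic_histogram_matrix(x, j2)])
    return int(np.linalg.matrix_rank(stacked))
```

Each coarse indicator is a sum of fine ones, so the rank equals $\max(D_m, D_{m'})$. This is the identity $D(m') = \max(D_m, D_{m'})$ in the dyadic histogram case.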
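Let $p_\varepsilon(m,m')$ be defined as in Lemma 6.2. Then, for all $m \in \mathcal{M}_n$,

$$\|\hat\psi_{\hat m} - \psi\|^2 \le 3\|\psi - \psi_m\|^2 + 2\,\mathrm{pen}^\psi(m) + 8\Big(\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(\ell_t) - p_\varepsilon(m,\hat m)\Big)_+ + 8p_\varepsilon(m,\hat m) + \mathrm{pen}^\psi(m) - \mathrm{pen}^\psi(\hat m).$$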
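The constants 3, 2 and 8 in this bound come from combining the first inequality of this proof (the one with the terms $\frac14\|\hat\psi_{\hat m} - \psi_m\|^2$ and $4\sup\nu_n^2(\ell_t)$) with $\|\hat\psi_{\hat m} - \psi_m\|^2 \le 2\|\hat\psi_{\hat m} - \psi\|^2 + 2\|\psi - \psi_m\|^2$. The intermediate algebra is as follows:

$$\begin{aligned}
\tfrac14\|\hat\psi_{\hat m} - \psi_m\|^2 &\le \tfrac12\|\hat\psi_{\hat m} - \psi\|^2 + \tfrac12\|\psi - \psi_m\|^2,\\
\tfrac12\|\hat\psi_{\hat m} - \psi\|^2 &\le \tfrac32\|\psi_m - \psi\|^2 + 4\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(\ell_t) + \mathrm{pen}^\psi(m) - \mathrm{pen}^\psi(\hat m),\\
\|\hat\psi_{\hat m} - \psi\|^2 &\le 3\|\psi - \psi_m\|^2 + 8\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(\ell_t) + 2\,\mathrm{pen}^\psi(m) - 2\,\mathrm{pen}^\psi(\hat m).
\end{aligned}$$

The stated bound then follows from $8\sup\nu_n^2(\ell_t) \le 8(\sup\nu_n^2(\ell_t) - p_\varepsilon(m,\hat m))_+ + 8p_\varepsilon(m,\hat m)$ and from $-2\,\mathrm{pen}^\psi(\hat m) \le \mathrm{pen}^\psi(m) - \mathrm{pen}^\psi(\hat m)$, which holds because penalties are nonnegative.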
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Now, note first that 

E( sup ul(£ t )-p^(m,m)) < V e( sup vl{t t )-p^(m,m')) . 

V *6B m ,™(0,l) J + m 7^ M V *6B m , m '(04) J + 

(6.7) 

Moreover it follows from Lemma 6.2 that 
V e( sup ul(£ t )-p^m,m')) < «i ( ^ + J^i c -" i/a ^ 



m'&M 



where S(m') = Sm'eA-i e K2eD ( m \ Then by taking e = 1/2 and assuming 

that \Mn\ < n and since" under (H 2 ), E m eM n e ~ a ° m < ELie^" < s ( a ) < 
+oo,Va > 0, this leads to the bound 



e( sup vl{l t ) -p^'im^m')) < 

V t6-B m ,ri«(0,l) ' + 



c 

n 
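Two numerical ingredients of this step can be made explicit. First, $C(1/2) = \sqrt{1.5} - 1 \approx 0.2247$. Second, when the dimensions $D_m$ are distinct positive integers (as for a nested collection with strictly increasing dimensions), $\sum_m e^{-aD_m} \le \sum_{k\ge1}e^{-ak} = 1/(e^a - 1)$. The sketch below only evaluates these expressions. Here `a` plays the role of $\kappa_2\varepsilon$, whose value is not known explicitly, and the function names are hypothetical.

```python
import math


def C(eps):
    """C(eps) = (sqrt(1 + eps) - 1) ∧ 1, as in Lemma 6.1."""
    return min(math.sqrt(1.0 + eps) - 1.0, 1.0)


def s(a):
    """s(a) = sum_{k >= 1} exp(-a k) = 1 / (exp(a) - 1), finite for every a > 0."""
    return 1.0 / math.expm1(a)


def partial_sum(a, dims):
    """sum_m exp(-a D_m) over a collection of distinct dimensions D_m >= 1."""
    return sum(math.exp(-a * d) for d in dims)
```

With dyadic dimensions $D_m = 2^j$, the partial sum stays below $s(a)$ whatever the number of models. This is why $\Sigma(\varepsilon)$ does not grow with $n$.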



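Therefore, we obtain the following result, which proves the theorem: for all $m \in \mathcal{M}_n$,

$$E(\|\hat\psi_{\hat m} - \psi\|^2) \le 3\|\psi - \psi_m\|^2 + 4\,\mathrm{pen}^\psi(m) + \frac{8C}{n} + E\big(8p_\varepsilon(m,\hat m) - \mathrm{pen}^\psi(m) - \mathrm{pen}^\psi(\hat m)\big).$$

Therefore, by using the definition of $p_\varepsilon(m,m')$ in Lemma 6.2, we choose

$$\mathrm{pen}^\psi(m) \ge 16(1+2\varepsilon)\Phi_0^2\int_0^1\psi(x)\,dx\,\frac{D_m}{n}.$$

This ensures that $8p_\varepsilon(m,m') \le \mathrm{pen}^\psi(m) + \mathrm{pen}^\psi(m')$ for all $m, m'$, and yields (6.7). □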
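The penalty condition can be checked mechanically. With $p_\varepsilon(m,m')$ as in Lemma 6.2, the lower bound $16(1+2\varepsilon)\Phi_0^2\int_0^1\psi\,D_m/n$ makes $8p_\varepsilon(m,m')$ exactly equal to the sum of the two lower bounds, so $8p_\varepsilon(m,m') \le \mathrm{pen}^\psi(m) + \mathrm{pen}^\psi(m')$ holds for any larger penalty. The following is a minimal sketch. The numerical values of $n$, $\varepsilon$, $\Phi_0$ and $\int\psi$ in the check are arbitrary, and the function names are hypothetical.

```python
def p_eps(Dm, Dmp, n, eps, Phi0, int_psi):
    """p_eps(m, m') = 2 (1 + 2 eps) Phi0^2 (integral of psi) (D_m + D_m') / n."""
    return 2.0 * (1.0 + 2.0 * eps) * Phi0 ** 2 * int_psi * (Dm + Dmp) / n


def pen_lower(Dm, n, eps, Phi0, int_psi):
    """Smallest admissible penalty: 16 (1 + 2 eps) Phi0^2 (integral of psi) D_m / n."""
    return 16.0 * (1.0 + 2.0 * eps) * Phi0 ** 2 * int_psi * Dm / n
```

Because the constant 16 equals $8 \times 2$, the condition is tight at the lower bound and does not depend on the ratio $D_{m'}/D_m$.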

6.4. Proof of Theorem 3.2 

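We start by writing that, for all $m \in \mathcal{M}_n$,

$$\gamma_n^{MS}(\hat F_{\hat m_0}) + \mathrm{pen}^{MS}(\hat m_0) \le \gamma_n^{MS}(F_m) + \mathrm{pen}^{MS}(m),$$

and by using decomposition (4.1), it follows that

$$\|\hat F_{\hat m_0} - F\|_n^2 \le \|F_m - F\|_n^2 + 2\nu_n^{MS}(\hat F_{\hat m_0} - F_m) + \mathrm{pen}^{MS}(m) - \mathrm{pen}^{MS}(\hat m_0).$$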

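In the same way as Baraud et al. (2001), we introduce, for $\|t\|_g^2 = \int t^2(u)g(u)\,du$, the ball $B^g_{m,m'}(0,1) = \{t \in S_m + S_{m'}, \|t\|_g = 1\}$ and the set

$$\Omega_n = \Big\{\Big|\frac{\|t\|_n^2}{\|t\|_g^2} - 1\Big| \le \frac12,\ \forall t \in \bigcup_{m,m'\in\mathcal{M}_n}(S_m + S_{m'})\setminus\{0\}\Big\}.$$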
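Whether $\Omega_n$ occurs can be decided exactly for piecewise constant spaces. Assume nested regular histograms, an illustrative case (the bases [T], [P], [W] are handled in Baraud (2002)). Then the union of the $S_m + S_{m'}$ reduces to the finest space of $D$ bins. For $t = \sum_j a_j\mathbf{1}_{I_j}$, the ratio $\|t\|_n^2/\|t\|_g^2$ is a weighted average of $N_j/(nG_j)$, where $N_j$ is the number of $U_i$ in $I_j$ and $G_j = \int_{I_j}g$. Hence $\Omega_n$ holds if and only if every such ratio lies in $[1/2, 3/2]$. The sketch below uses a hypothetical function name and defaults to a uniform $g$.

```python
import numpy as np


def omega_n_holds(u, D, G=None):
    """Check the event Omega_n for histograms on D regular bins of [0, 1].

    N_j counts the U_i in bin j; G_j is the g-mass of bin j (default 1/D,
    i.e. a uniform g, an illustrative assumption). Omega_n holds iff every
    ratio N_j / (n G_j) lies within 1/2 of 1.
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    idx = np.minimum((u * D).astype(int), D - 1)
    N = np.bincount(idx, minlength=D)
    G = np.full(D, 1.0 / D) if G is None else np.asarray(G, dtype=float)
    ratios = N / (n * G)
    return bool(np.all(np.abs(ratios - 1.0) <= 0.5))
```

An empty bin alone makes the ratio 0 and rules out $\Omega_n$. This is why the admissible dimension $N_n$ must grow more slowly than $n$.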

On the complement of $\Omega_n$, a separate study leads to the following lemma:

Lemma 6.3. If $N_n \le \sqrt{n}/\ln(n)$ for [T], or $N_n \le n/\ln(n)$ for [P] or [W], then $P(\Omega_n^c) \le c/n$ and $E(\|\hat F_{\hat m_0} - F\|_n^2\mathbf{1}_{\Omega_n^c}) \le c'/n$, where $c$ and $c'$ are positive constants.

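Proof of Lemma 6.3. The bound $P(\Omega_n^c) \le c/n^2$ is in fact a pure property of the basis, and it is proved under our assumptions in Baraud (2002). Moreover, $\|\hat F_{\hat m_0} - F\|_n^2 \le 2(\|\hat F_{\hat m_0}\|_n^2 + \|F\|_n^2)$. Now $\|F\|_n^2 \le 1$ and $\|\hat F_{\hat m_0}\|_n^2 = (1/n)\|\Pi_{\hat m_0}\delta\|_{\mathbb{R}^n}^2$, where $\delta = (\delta_1, \dots, \delta_n)'$, $\Pi_m$ is the orthogonal projection in $\mathbb{R}^n$ onto $\{(t(U_1), \dots, t(U_n))', t \in S_m\}$, and $\|\cdot\|_{\mathbb{R}^n}$ is the Euclidean norm in $\mathbb{R}^n$. It follows that $\|\hat F_{\hat m_0}\|_n^2 \le (1/n)\|\delta\|_{\mathbb{R}^n}^2 = (1/n)\sum_{i=1}^n\delta_i^2 \le 1$. Therefore $E(\|\hat F_{\hat m_0} - F\|_n^2\mathbf{1}_{\Omega_n^c}) \le 4P(\Omega_n^c) \le c'/n$. □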
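The key step $\|\hat F_{\hat m_0}\|_n^2 \le (1/n)\|\delta\|_{\mathbb{R}^n}^2$ uses only the fact that least squares fitted values are an orthogonal projection of $\delta$ in $\mathbb{R}^n$. The following sketch uses a polynomial basis purely for illustration; the paper's bases [T], [P], [W] would enter in the same way. The names are hypothetical.

```python
import numpy as np


def poly_basis(u, D=4):
    """Hypothetical basis for illustration: 1, u, ..., u^(D-1) evaluated at u."""
    return np.vander(np.asarray(u, dtype=float), D, increasing=True)


def ls_fit_on_sample(u, delta, basis=poly_basis):
    """Fitted values of the least squares (mean-square contrast) fit of delta.

    They equal the orthogonal projection of delta in R^n onto
    {(t(U_1), ..., t(U_n))', t in S_m}, so their Euclidean norm cannot
    exceed that of delta.
    """
    Phi = basis(u)
    coef, *_ = np.linalg.lstsq(Phi, np.asarray(delta, dtype=float), rcond=None)
    return Phi @ coef
```

Since $\delta_i \in \{0, 1\}$, the empirical norm of the fit is at most 1, whatever the selected model.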

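Therefore, we focus on bounds on $\Omega_n$, where the inequality $\|t\|_g^2 \le 2\|t\|_n^2$ holds. We obtain

$$\|\hat F_{\hat m_0} - F\|_n^2\mathbf{1}_{\Omega_n} \le \|F_m - F\|_n^2 + \frac18\|\hat F_{\hat m_0} - F_m\|_n^2\mathbf{1}_{\Omega_n} + 16\sup_{t\in B^g_{m,\hat m_0}(0,1)}[\nu_n^{MS}]^2(t) + \mathrm{pen}^{MS}(m) - \mathrm{pen}^{MS}(\hat m_0)$$

$$\le \Big(1 + \frac14\Big)\|F_m - F\|_n^2 + \frac14\|\hat F_{\hat m_0} - F\|_n^2\mathbf{1}_{\Omega_n} + 16\Big(\sup_{t\in B^g_{m,\hat m_0}(0,1)}[\nu_n^{MS}]^2(t) - p(m,\hat m_0)\Big)_+ + \mathrm{pen}^{MS}(m) + 16p(m,\hat m_0) - \mathrm{pen}^{MS}(\hat m_0).$$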
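The first inequality of this display combines the linearity of $\nu_n^{MS}$ in $t$ with $2ab \le a^2/16 + 16b^2$ and with the bound $\|t\|_g^2 \le 2\|t\|_n^2$ on $\Omega_n$. The second uses $\|\hat F_{\hat m_0} - F_m\|_n^2 \le 2\|\hat F_{\hat m_0} - F\|_n^2 + 2\|F - F_m\|_n^2$. Written out, assuming (as the variance computation that follows suggests) that $\nu_n^{MS}(t) = \frac1n\sum_i(\delta_i - F(U_i))t(U_i)$:

$$\begin{aligned}
2\nu_n^{MS}(\hat F_{\hat m_0} - F_m) &\le 2\|\hat F_{\hat m_0} - F_m\|_g\sup_{t\in B^g_{m,\hat m_0}(0,1)}|\nu_n^{MS}(t)|\\
&\le \frac1{16}\|\hat F_{\hat m_0} - F_m\|_g^2 + 16\sup_{t\in B^g_{m,\hat m_0}(0,1)}[\nu_n^{MS}]^2(t)\\
&\le \frac18\|\hat F_{\hat m_0} - F_m\|_n^2 + 16\sup_{t\in B^g_{m,\hat m_0}(0,1)}[\nu_n^{MS}]^2(t) \quad\text{on } \Omega_n,\\
\frac18\|\hat F_{\hat m_0} - F_m\|_n^2 &\le \frac14\|\hat F_{\hat m_0} - F\|_n^2 + \frac14\|F - F_m\|_n^2.
\end{aligned}$$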

Let $(\varphi_\lambda)_{\lambda\in\Lambda_{m,m'}}$ be an orthonormal basis of $S_m + S_{m'}$ for the scalar product $\langle\cdot,\cdot\rangle_g$ (built by Gram-Schmidt orthonormalization). It is easy to see that



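$$E\Big(\sup_{t\in B^g_{m,m'}(0,1)}[\nu_n^{MS}]^2(t)\Big) \le \sum_{\lambda\in\Lambda_{m,m'}}\frac1n\mathrm{Var}\big([\delta_1 - F(U_1)]\varphi_\lambda(U_1)\big)$$

$$\le \sum_{\lambda\in\Lambda_{m,m'}}\frac1n E_X\Big(\int[\mathbf{1}_{X\le u} - F(u)]^2\varphi_\lambda^2(u)g(u)\,du\Big)$$

$$= \frac1n\sum_{\lambda\in\Lambda_{m,m'}}\int E_X[\mathbf{1}_{X\le u} - F(u)]^2\varphi_\lambda^2(u)g(u)\,du$$

$$= \frac1n\sum_{\lambda\in\Lambda_{m,m'}}\int F(u)(1 - F(u))\varphi_\lambda^2(u)g(u)\,du \le \frac{D_m\vee D_{m'}}{n},$$

as $F(u)(1 - F(u)) \le 1$. Therefore, by applying Talagrand's Inequality, we obtain

$$\sum_{m'\in\mathcal{M}_n}E\Big(\sup_{t\in B^g_{m,m'}(0,1)}[\nu_n^{MS}]^2(t) - p(m,m')\Big)_+ \le \frac{C}{n},$$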
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with 

p(m,m ) = 4 := in , 

n 

sup Var[(<5i - F(Z7i))i(J7i)] < sup E(t 2 ([/i)) = 1 := u, 
teB g m , m (o,i) tes^, m (o,i) 

and sup - J F(t/i))t|| 00 < sup < $o\ZA n ,m'/.9o := Mi. 

teB g , (o,i) teB a , (0,1) 
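The $g$-orthonormal basis $(\varphi_\lambda)$ used in this proof can be obtained numerically by Gram-Schmidt with the weighted inner product $\langle f, h\rangle_g = \int f h g$. The following minimal sketch represents functions by their values on a quadrature grid (midpoint rule). The polynomial family and the density $g(u) = 1/2 + u$ are illustrative inputs only, and the function name is hypothetical.

```python
import numpy as np


def gram_schmidt_g(F, g_vals, w):
    """Orthonormalize the columns of F for <f, h>_g = ∫ f h g (by quadrature).

    F: (K, D) values of D basis functions at K quadrature nodes;
    g_vals: g at the nodes; w: quadrature weights. Modified Gram-Schmidt.
    """
    wg = np.asarray(w, dtype=float) * np.asarray(g_vals, dtype=float)
    Q = np.array(F, dtype=float)
    for j in range(Q.shape[1]):
        for i in range(j):
            # Remove the component of column j along the already normalized column i.
            Q[:, j] -= np.sum(wg * Q[:, i] * Q[:, j]) * Q[:, i]
        Q[:, j] /= np.sqrt(np.sum(wg * Q[:, j] ** 2))
    return Q
```

With such a basis, $\int\varphi_\lambda^2 g = 1$ for each $\lambda$. This property turns $\sum_\lambda\int F(1-F)\varphi_\lambda^2 g$ into a quantity of at most $D_m\vee D_{m'}$.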